Yee Gin Woon's profile

The Analysis of Energy Consumption on the Environment

Analysing the effect of Energy Consumption on the Environment
This project differs from my past explorations. Instead of coming up with a product to address a problem, I evaluate an environmental issue using R with three packages from the R library: car, stargazer and htmlTable. In contrast to the conventional qualitative study of social issues, I use quantitative methods to analyse the extent to which energy consumption affects the environment.
Introduction
Erratic weather, rising water levels and melting glaciers caused by increasingly warm temperatures have drawn attention to the damaging effects of global warming on ecosystems. Energy consumption contributes heavily to total greenhouse gas emissions, indicating that energy usage and the effects of climate change are closely intertwined. Building on existing research findings, I explore variables using statistical methods to analyse the link between energy consumption and the environment. The data was taken from the World Bank and covers different nations in 2010.

The connection between energy consumption and carbon dioxide (CO2) emissions is studied, together with confounding variables, in a cross-sectional analysis. The paper studies the data visually and applies a suitable model, such as multiple regression, to draw conclusions and provide possible recommendations.
Background
Globalisation drives nations to expand economically and become more populous, which contributes to rising energy consumption. Since the 1980s, people have been moving towards modernisation, posing threats to the environment. Global warming is still prevalent today. The prolonged heat and the greenhouse gases trapped in the atmosphere have worsened living conditions. CO2 is a greenhouse gas that comes from two kinds of sources, natural and man-made. While natural CO2 can be balanced out by nature, the massive amount of CO2 generated by human activities each year makes it hard to minimise emissions to the atmosphere. As the impact of climate change has become more salient in recent years, growing concern and awareness about climate issues have spread the idea of mitigating harm to the environment.

CO2 is an indicator commonly used by scholars, and here the focus is on human-generated CO2 emissions from the World Development Indicators (WDI). Unlike natural CO2, the level of these emissions varies with human energy usage. Indeed, energy utilisation has doubled over the decade, accounting for a 1.7% surge in CO2. Besides energy consumption, one theory suggests that a growing economy has an adverse effect on CO2 emissions, though no causality between energy and economic growth has been established. The economic indicator, Gross Domestic Product (GDP), has an indirect impact on the environment, stemming from the growing consumption of commodities that use up energy. Additionally, population size is another factor in climate change. Some findings also suggest that smaller populations in developed areas lower CO2 emissions; for example, a city tends to consume more energy than a countryside area, although there is no critical difference in CO2 emissions between the two. This also raises the point that land usage contributes to CO2 emissions via deforestation, the rearing of livestock and agriculture. This research adds a digitalisation aspect, as the 21st century is more technologically advanced, and this helps to bridge a gap in existing research. Therefore, data from various nations in 2010 is used to examine how energy and other variables affect the environment.
Variables
A regression model frames the influence of energy consumption on the environment in terms of CO2 emissions. We look at the dependent variable and the independent variable while minimising endogeneity in the equation. The following sections list the types of variables involved in this paper.

Dependent Variable
The measured indicator for the dependent variable is CO2 emissions (kt), which represents the greenhouse impact on the environment. Instead of using CO2 in its original units, the logarithmic (log) function is applied to achieve an approximately normal distribution. Without the log, the histogram is skewed to the right.
The histogram provides the frequency distribution of the CO2 data. After the log transform, a change in the indicator is interpreted as an approximate percentage change, and the indicator has a mean of 9.41 (in log units).
The box plot shows the spread of the data, with the centre at approximately 9, the interquartile range within the edges of the box, and the outlier lying outside the box and its whiskers.
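The effect of the log transform on a right-skewed variable can be sketched in R as follows. This is only an illustration on simulated data: the values, the variable names (co2_kt, log_co2) and the small skewness helper are assumptions of mine, not the World Bank data or the original script.

```r
# Hypothetical, simulated data: the real World Bank values are not reproduced here.
set.seed(42)
co2_kt <- rlnorm(150, meanlog = 9.4, sdlog = 2.2)  # right-skewed stand-in for emissions

# Simple moment-based skewness (base R has no built-in skewness function).
skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

log_co2 <- log(co2_kt)  # natural log of the raw values

skew_raw <- skewness(co2_kt)
skew_log <- skewness(log_co2)
cat("Skewness before log:", round(skew_raw, 2), "\n")
cat("Skewness after log: ", round(skew_log, 2), "\n")

# Histograms and box plot, drawn only in an interactive session.
if (interactive()) {
  hist(co2_kt, main = "CO2 emissions (kt)")
  hist(log_co2, main = "Log of CO2 emissions")
  boxplot(log_co2, main = "Log of CO2 emissions")
}
```

A skewness close to zero after the transform is what a roughly symmetric, bell-shaped histogram looks like in numbers.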
Independent Variable
For the main argument, the independent variable is primary energy consumption (thousands of coal-ton equivalents). Like the dependent variable, the original energy consumption data is skewed, and applying the log function helps to study the trend. After the transform, changes are read as approximate percentage changes rather than in the original coal-ton units.
Scatter Plot of the Log function shows normal distribution of energy consumption.
The histogram was employed for data visualisation. After the log transform, the independent variable is approximately normally distributed with a mean of approximately 8.87 (in log units).
Both the independent variable and the dependent variable use the log function. The general assumption is that the dependent variable and the independent variable are positively correlated.

Control Variable
The remaining variables are the control variables. They are treated as potential confounders and are held constant so that they do not distort the estimated effect of the independent variable. They are assumed to have no causal relationship with the independent variable.

(1) GDP (constant 2010 US$)

Real GDP captures the effect of economic growth on the dependent variable. There is a general positive trend between GDP and CO2 emissions.

(2) Population (Total)
Population is also expected to be positively correlated with CO2 emissions, as fewer people use less energy and therefore release less CO2.

(3) Arable land (% of land area)  
Arable land measures the proportion of land that can be used for agricultural purposes. Increasing agricultural land is assumed to reduce CO2 emissions. Hence, arable land and CO2 emissions are expected to have a negative correlation.

(4) Mobile cellular subscriptions (per 100 people)  
Mobile cellular subscriptions serve as a proxy for internet connectivity. A high number of subscriptions is a sign that the country is more digitised and technologically reliant. CO2 emissions are likely to increase as mobile cellular subscriptions rise.

Hypothesis

From the discussion of variables, testable hypotheses were established.

H0: There is no statistically significant relationship between energy consumption and CO2 emissions.

H1: A rise in energy consumption is associated with a statistically significant increase in CO2 emissions.

H2: CO2 emissions rise with increasing energy consumption when GDP is controlled for.

H3: CO2 emissions rise with increasing energy consumption when GDP and population are controlled for.

H4: When arable land is added to H3, CO2 emissions are likely to fall with increasing energy consumption, as more agricultural land implies reduced CO2 emissions through photosynthesis.

H5: The increase in CO2 emissions with rising energy consumption is statistically significant when GDP, population and mobile subscriptions are controlled for.

H6: The increase in CO2 emissions with rising energy consumption is statistically significant when GDP, population, arable land and mobile subscriptions are controlled for.

Empirical Analysis
The data was studied in detail to understand the interactions between variables, and regression analysis was conducted to test the proposed hypotheses.
Descriptive Statistics
Conditional statements were applied to give an idea of the distribution of CO2 emissions across regions. High emissions refer to countries with emissions above the median value, whereas low emissions refer to countries with emissions below the median.
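A conditional split of this kind might look like the sketch below. The data frame, the placeholder country codes and the column names are hypothetical and only illustrate the idea of labelling each country relative to the median before counting by region.

```r
# Hypothetical mini data set; country codes and values are illustrative only.
emissions <- data.frame(
  country = c("AAA", "BBB", "CCC", "DDD", "EEE", "FFF"),
  region  = c("Asia", "Asia", "Europe", "Europe", "Africa", "Africa"),
  log_co2 = c(12.1, 8.3, 10.4, 9.0, 7.2, 9.9)
)

cutoff <- median(emissions$log_co2)

# Conditional statement: label each country relative to the median.
emissions$level <- ifelse(emissions$log_co2 > cutoff, "High", "Low")

# Number of high and low emitters in each region.
counts <- table(emissions$region, emissions$level)
print(counts)
```

Using the median rather than the mean as the cut-off keeps the two groups the same size even when a few very large emitters are present.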
The scatter plot shows that the average of the log emissions is 9.41, marked with the horizontal line. Each point is labelled with the country's abbreviation. We can see that large nations like China and the US are major energy consumers and generators of CO2 emissions.
Scatter Plot of Nation (in Abbreviations) in a Linear Regression
The best-fit line of the linear regression is plotted with the estimated equation ŷ = 1.78 + 0.85x. Coloured points and a legend are used to visualise the trend of the bivariate data across the various regions. I like that R has a simple function that lets programmers plot points in different colours based on labelled categories, here the types of region. The presentation of the data remains rather neat too.
Scatter Plot for Linear Regression
The lm function in R provides the best-fit line for the linear regression. A 1% increase in energy consumption is associated with a 0.85% increase in CO2 emissions. Energy consumption is statistically significant, with a p-value below the 5% significance level. Hence, energy consumption and CO2 emissions are strongly correlated. However, correlation does not equate to causation, and there are other factors that affect CO2 emissions and are indirectly linked to energy consumption. These are the control variables, namely GDP, Population, Arable land and Mobile Cellular Subscriptions.
Summary of Linear Regression
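A minimal sketch of such a log-log fit with lm is given below. The data is simulated, using the reported intercept (1.78) and slope (0.85) only to generate plausible values, and the names sim, log_energy and log_co2 are assumptions rather than the original variable names.

```r
# Simulated data for illustration only; 1.78 and 0.85 are borrowed from the
# fitted line reported above purely to generate the data.
set.seed(1)
n <- 200
log_energy <- rnorm(n, mean = 8.87, sd = 2)
log_co2 <- 1.78 + 0.85 * log_energy + rnorm(n, sd = 0.5)
sim <- data.frame(log_co2, log_energy)

# Simple (bivariate) linear regression on the log-log scale.
model1 <- lm(log_co2 ~ log_energy, data = sim)
slope <- coef(model1)[["log_energy"]]
p_value <- summary(model1)$coefficients["log_energy", "Pr(>|t|)"]

cat("Estimated slope:", round(slope, 3), "\n")
cat("p-value below 0.05:", p_value < 0.05, "\n")

# Scatter plot with the fitted line, drawn only in an interactive session.
if (interactive()) {
  plot(sim$log_energy, sim$log_co2, xlab = "Log energy", ylab = "Log CO2")
  abline(model1)
}
```

Because both sides are in logs, the slope reads directly as the percentage change in CO2 emissions associated with a 1% change in energy consumption.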
Since several variables are involved, a multiple linear regression model was adopted. To understand the relation of each component, partial regression was applied, in which each variable was plotted separately against CO2 emissions while the other variables were held constant. The distribution was skewed when the plots of GDP and Population were conditioned on the other variables, and thus the log function was applied to them to achieve a comprehensible representation of the dependent variable and covariates in the multiple regression.
Applied Log on Partial Regression Model
The general equation of the multiple regression is the following:
Equation of the multiple regression
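Based on the variables and log transforms described above, the full model with all four controls plausibly takes the form below. This is a sketch rather than the original equation: the symbols are my own, and leaving arable land and mobile subscriptions untransformed is an assumption, since only CO2, energy, GDP and population are described as logged.

```latex
\ln(\mathrm{CO_2})_i = \beta_0
  + \beta_1 \ln(\mathrm{Energy})_i
  + \beta_2 \ln(\mathrm{GDP})_i
  + \beta_3 \ln(\mathrm{Population})_i
  + \beta_4\,\mathrm{Arable}_i
  + \beta_5\,\mathrm{Mobile}_i
  + \varepsilon_i
```

Here i indexes countries and the error term collects everything not captured by the regressors. The smaller models drop terms from this form; for example, the model with GDP, population and mobile subscriptions omits the arable land term.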
Different hypotheses were tested. The link between energy consumption and CO2 is statistically significant with a positive correlation, so the null hypothesis, H0, is rejected. H1 to H6 were supported, as increasing energy consumption leads to increasing CO2 emissions. The only exception was H4, as arable land has a negligible effect on emissions in the partial regression.


For a well-defined, non-spurious model, omitted variable bias and multicollinearity were checked. Omitted variable bias arises when a relevant variable is left in the error term, and multicollinearity arises when independent variables are highly correlated with one another, which makes the coefficient estimates imprecise. The slope of energy consumption was initially 0.85 in the simple linear regression model. When Model 2 was fitted to test H2, the slope fell to 0.35, which indicates a positive bias in the simple model. However, the slopes of Model 3 and Model 4, fitted to test H3 and H4, are both about 0.44 for energy consumption, which indicates a negative bias in Model 2. Multicollinearity was initially thought to be the reason for the model's inefficiency, but this did not hold, as the slope declined again for Model 5 and Model 6 when H5 and H6 were tested. The slope of Model 5 and Model 6, approximately 0.38, is closer to that of Model 2. After examining the six models, Models 2, 5 and 6 appear to be good models.
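The way a slope shrinks once a correlated control is added can be reproduced on simulated data. The sketch below is an illustration under my own assumptions (made-up coefficients, one control only), not a rerun of the six models. It also computes a variance inflation factor by hand, which is the usual numeric check for multicollinearity.

```r
# Simulated data for illustration only (not the World Bank sample).
set.seed(7)
n <- 500
log_gdp    <- rnorm(n, mean = 24, sd = 2)
log_energy <- 0.8 * (log_gdp - 24) + 9 + rnorm(n, sd = 0.8)  # correlated with GDP
log_co2    <- -8 + 0.4 * log_energy + 0.5 * log_gdp + rnorm(n, sd = 0.4)
sim <- data.frame(log_co2, log_energy, log_gdp)

short_fit <- lm(log_co2 ~ log_energy, data = sim)            # GDP omitted
long_fit  <- lm(log_co2 ~ log_energy + log_gdp, data = sim)  # GDP controlled

b_short <- coef(short_fit)[["log_energy"]]
b_long  <- coef(long_fit)[["log_energy"]]
cat("Slope without GDP:", round(b_short, 3), "\n")
cat("Slope with GDP:   ", round(b_long, 3), "\n")

# Variance inflation factor for log_energy, computed by hand as 1 / (1 - R^2)
# from regressing it on the other regressor. car::vif(long_fit) reports the
# same quantity when the car package is installed.
r2_aux <- summary(lm(log_energy ~ log_gdp, data = sim))$r.squared
vif_energy <- 1 / (1 - r2_aux)
cat("VIF for log_energy:", round(vif_energy, 2), "\n")
```

In this simulation the omitted control is positively related to both energy and CO2, so the short regression overstates the energy slope, which is the positive-bias pattern described above.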


Adjusted R-squared is another way to determine the most suitable model, and a higher value indicates a better fit. Model 5 is the best, as it has the highest adjusted R-squared value of the six. Model 6, on the other hand, was expected to be the optimal model but proved otherwise. Adding the arable land variable in Model 6 lowers the adjusted R-squared value, so the variable does not improve the model. Model 5 explains 93.05% of the variance in CO2 emissions. At the 95% confidence level, the significance level α is 5%, and a result is statistically significant when the p-value is below 0.05. Therefore, the result is statistically significant, as the p-value is 2.2x10^-6. The selected multiple regression is Model 5.

Selected model for multiple regression
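For reference, the standard definition of adjusted R-squared is shown below, where n is the number of observations and k is the number of regressors excluding the intercept (the symbols are the usual textbook ones, not taken from the original output).

```latex
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - k - 1}
```

Adding a regressor increases k, so the adjusted value falls unless the plain R-squared rises enough to compensate. This is why an extra variable such as arable land can lower the adjusted R-squared even though the plain R-squared never decreases.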
Model Diagnosis

Model diagnosis checks that the assumptions of linear regression are not violated. The assumptions are summarised by LINE, which stands for Linearity, Independence, Normality and Equal Variance. The diagnosis was applied to Model 5.
Test for Linearity, Independence and Equal Variance 
The Residuals vs Fitted plot checks the assumptions for the multiple regression, Model 5. The red line in the Residuals vs Fitted graph is relatively flat, and hence the relationship is linear. No specific pattern is shown across the fitted values, which satisfies the independence criterion. The spread of residuals across fitted values shows no sign of convergence or divergence and appears randomly distributed. No heteroscedasticity is observed, so the equal variance assumption holds.
Residuals vs Fitted Plot
Test for Normality
The Normal Q-Q plot tests the normality of the residuals. The residuals are approximately equal to their expected values, which are represented by the straight line. Hence, the normality assumption holds. The outlier in the plot is not salient and does not affect the accuracy of the regression model.
Normal Q-Q Plot
The LINE assumptions are fulfilled, and hence the estimates of Model 5 can be treated as the best linear unbiased estimator.

The Residuals vs Leverage graph presents the residuals within the boundaries of Cook's distance. Leverage examines whether an observation is unusual with respect to the multiple regression. There was no extraordinary finding, as the plotted points lie within Cook's distance and no outlier needs to be removed; thus Model 5 does not require much fine-tuning. It is a good fit.
Residuals vs Leverage Plot (Cook's Distance)
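The three diagnostic plots discussed in this section come from calling plot on a fitted lm object. The sketch below fits a stand-in for Model 5 on simulated data and extracts the quantities behind the plots. The coefficients used to generate the data are arbitrary and the column names are assumptions.

```r
# Simulated stand-in for the data behind Model 5 (illustrative values only).
set.seed(5)
n <- 150
sim <- data.frame(
  log_energy = rnorm(n, 8.9, 2),
  log_gdp    = rnorm(n, 24, 2),
  log_pop    = rnorm(n, 16, 1.5),
  mobile     = runif(n, 10, 150)
)
sim$log_co2 <- -7.8 + 0.38 * sim$log_energy + 0.3 * sim$log_gdp +
  0.3 * sim$log_pop + 0.002 * sim$mobile + rnorm(n, sd = 0.4)

fit <- lm(log_co2 ~ log_energy + log_gdp + log_pop + mobile, data = sim)

# Quantities behind the three diagnostic plots.
res     <- residuals(fit)       # Residuals vs Fitted, Normal Q-Q
lev     <- hatvalues(fit)       # leverage of each observation
cooks_d <- cooks.distance(fit)  # influence of each observation

cat("Mean residual:", signif(mean(res), 3), "\n")
cat("Largest Cook's distance:", round(max(cooks_d), 3), "\n")
cat("Observations with Cook's distance above 0.5:", sum(cooks_d > 0.5), "\n")

# Residuals vs Fitted (1), Normal Q-Q (2), Residuals vs Leverage (5).
if (interactive()) plot(fit, which = c(1, 2, 5))
```

In the Residuals vs Leverage plot, R draws the Cook's distance contours at 0.5 and 1 by default, so counting the observations above 0.5 is a quick numeric version of checking that the points lie within the contours.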
Interpretation of the result 

Stargazer offers an organised summary of the regression results for the hypothesised models. From the summaries and results, Model 5 is the best-fitting model, as assessed in the model choice and diagnosis sections.
Regression Results
For Model 5, the intercept is -7.805, which is the predicted log of CO2 emissions when every regressor, including the log of energy consumption, equals zero. If energy consumption increases by 1%, CO2 emissions are expected to increase by 0.383%, holding the GDP, Population and Mobile variables constant. The high adjusted R-squared and a p-value below the 5% significance level substantiate the model's suitability for explaining the pattern between energy consumption and carbon emissions.
Summary of Statistical Data for Model 5
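The percentage reading of the energy coefficient follows from the log-log form of the model. A short worked calculation, using the reported slope of 0.383 and holding the other regressors fixed, is sketched below. The notation is my own.

```latex
\begin{aligned}
\ln(\mathrm{CO_2}) &= \beta_0 + \beta_1 \ln(\mathrm{Energy}) + \dots, \qquad \beta_1 = 0.383 \\
\frac{\mathrm{CO_2^{new}}}{\mathrm{CO_2^{old}}} &= \left(\frac{\mathrm{Energy^{new}}}{\mathrm{Energy^{old}}}\right)^{\beta_1} \\
1\%\ \text{increase in energy:}\quad & 1.01^{0.383} - 1 \approx 0.0038 = 0.38\% \\
10\%\ \text{increase in energy:}\quad & 1.10^{0.383} - 1 \approx 0.0372 = 3.72\%
\end{aligned}
```

For small changes, the coefficient can be read directly as a percentage effect. For larger changes, the exact ratio formula gives a slightly smaller figure than simply multiplying the slope by the percentage change.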
Potential Concerns

Although a high adjusted R-squared value explains the relation between energy consumption and CO2 emissions, not every control variable can be accounted for. Energy consumption is what makes energy available to industries, but the energy pumped into industries does not translate proportionally into CO2 emissions. It is hard to pin down the major uses of energy that contribute to emissions. Disaggregating energy consumption by demographics or nature of work is recommended for future research. Another issue is that it is impractical to include all possible components that increase or reduce CO2 emissions, as this introduces multicollinearity, which makes the model inefficient. Present mitigation measures are also important factors for examining the feasibility of reducing CO2 emissions. Data from 2010 alone is not a good basis for judgement, as new measures to curb climate effects were introduced in the following years. For instance, the Paris Agreement, COP26 and the deployment of renewable energy have since been introduced to lessen greenhouse effects. Comparing past data with the latest data can give more robust findings on the underlying issues of CO2 emissions. From the differences observed over the years, effective policies can be implemented and conclusions about the progress of decarbonisation can be drawn. CO2 alone may also not be a good indicator of environmental degradation, as other gases such as methane and nitrogen dioxide add to the greenhouse effect. It would be good to include total greenhouse gases to understand the true extent of global warming from energy consumption.
Conclusion

The positive correlation indicates that energy consumption has an impact on CO2 emissions. The findings show that high energy consumption in large nations generates a massive amount of gases. Large nations like China, the US, Russia and Japan emit more CO2 from the energy used by the activities of their vast populations. Conversely, nations with CO2 emissions below the median are thought to be less developed and usually have less human activity, and hence release less CO2. These findings tie in with existing studies showing that energy consumption, a large population and the high GDP of a developed nation have the potential to aggravate global warming. The new variable introduced in the study, mobile cellular subscriptions, implies that digitalisation has a negative impact on the environment and requires moderation, given that the world is advancing rapidly towards technological innovation. The statistically significant relationship between energy consumption and CO2 emissions in Model 5 is a key reason to enforce cooperation to reduce energy consumption, especially for the US and China. They are still major emitters of greenhouse gases today. Disaggregating energy consumption, which is mentioned as one of the potential concerns, should be considered in policymaking to alleviate extensive CO2 emissions. Placing a limit on countries is easy, but identifying the sources of CO2 emissions helps policymakers implement appropriate policies that target the key parties efficiently.

Seeking international cooperation is not easy when nations have different priorities in their respective developments, as seen in the stalemated negotiations at several climate summits. It is imperative for mankind to care for the planet. Failure to do so would threaten our livelihoods in the near future. We are already witnessing problems with food production, the extinction of species and increasing natural hazards. Therefore, let's work towards net zero carbon emissions by 2050 to save the planet.